Use ingestion-client in the `Shuffler` #4024

muhamadazmy · 2025-11-17T17:10:37Z

Use ingestion-client in the Shuffler

Avoid direct writes to bifrost in shuffler by using a
dedicated ingestion-client instance.

Stack created with Sapling. Best reviewed with ReviewStack.

tillrohrmann

Thanks a lot for replacing the direct Bifrost write with the IngressClient in the Shuffle, @muhamadazmy. Maybe the name IngressClient no longer fits 100%, given that the Shuffle now uses it as well; something like IngestionClient might work better. Since we don't use the send window of the IngressClient yet, I wouldn't expect the runtime behavior of the shuffle to change. Once we do, I would be interested in how much the overall shuffle throughput increases by using the IngressClient.

I left a few minor comments for your consideration.

crates/worker/src/lib.rs

crates/worker/src/partition/leadership/mod.rs

tillrohrmann · 2025-11-25T22:31:26Z

crates/worker/src/partition/shuffle.rs

```diff
+                    ingress
+                        .ingest(
+                            msg.partition_key(),
+                            IngestRecord::from_parts(msg.record_keys(), msg),
+                        )
+                        .await?;
```


As a follow-up to this PR, we should make use of being able to send more than a single record at a time to maximize our throughput. I guess this will require an overhaul of the Shuffle component. I think quite a few things can be simplified here (no more pin projecting if we don't require `shuffle_next_message` to run in the select arm, etc.).

I totally agree.
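To illustrate the follow-up idea of sending more than one record per ingest call, here is a minimal, std-only sketch that groups outbox messages into size-bounded batches. `OutboxMsg`, `batch_messages`, and the byte cap are hypothetical names chosen for illustration; they are not the Shuffle's or the ingestion client's actual API.

```rust
/// Hypothetical outbox message; a stand-in for the shuffle's real envelope type.
#[derive(Debug, Clone, PartialEq)]
pub struct OutboxMsg {
    pub partition_key: u64,
    pub payload: Vec<u8>,
}

/// Splits a run of outbox messages into batches whose total payload size does
/// not exceed `max_batch_bytes`, keeping the original message order.
/// A single message larger than the cap still forms its own batch, so it is
/// never dropped.
pub fn batch_messages(msgs: Vec<OutboxMsg>, max_batch_bytes: usize) -> Vec<Vec<OutboxMsg>> {
    let mut batches = Vec::new();
    let mut current = Vec::new();
    let mut current_bytes = 0usize;

    for msg in msgs {
        let size = msg.payload.len();
        // Close the current batch if adding this message would overflow it.
        if !current.is_empty() && current_bytes + size > max_batch_bytes {
            batches.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        current_bytes += size;
        current.push(msg);
    }

    // Flush the trailing, partially filled batch.
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Order is preserved within and across batches, on the assumption that the shuffle needs to hand off outbox messages in sequence.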

tillrohrmann · 2025-11-25T22:39:49Z

crates/worker/src/lib.rs

```diff
+            networking.clone(),
+            Metadata::with_current(|m| m.updateable_partition_table()),
+            partition_routing.clone(),
+            NonZeroUsize::new(5 * 1024 * 1024).unwrap(), // 5MB
```


How should we size the buffer window to fully utilize the network connections between the partition processors? Should it be something along the lines of RTT * bandwidth * #nodes * 2, so that all connections can be kept fully utilized?

Probably also a good idea to make this configurable.

That's the buffer for in-flight records, which is shared across partition sessions. It should eventually be distinct from the partition session chunk size (work in progress), which is the one that has to respect the max network request size.
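The heuristic RTT * bandwidth * #nodes * 2 is the per-connection bandwidth-delay product, scaled by the number of peers and doubled for headroom. A minimal sketch of computing it and clamping it to configurable bounds, so it could replace the hard-coded 5MB `NonZeroUsize`; the function name, its parameters, and the bounds are assumptions for illustration, not an existing Restate configuration option.

```rust
use std::num::NonZeroUsize;
use std::time::Duration;

/// Hypothetical helper: sizes the shared in-flight budget as
/// RTT * bandwidth * nodes * 2, clamped to [min, max].
/// Panics if `min > max` (via `Ord::clamp`).
pub fn inflight_budget_bytes(
    rtt: Duration,
    bandwidth_bytes_per_sec: u64,
    nodes: u64,
    min: NonZeroUsize,
    max: NonZeroUsize,
) -> NonZeroUsize {
    // Bandwidth-delay product of a single connection, in bytes.
    let bdp = (bandwidth_bytes_per_sec as u128).saturating_mul(rtt.as_micros()) / 1_000_000;
    // One BDP per peer connection, doubled for headroom.
    let budget = bdp.saturating_mul(nodes as u128).saturating_mul(2);
    let clamped = budget.clamp(min.get() as u128, max.get() as u128);
    // `clamped` lies in [min, max], so it fits in usize and is non-zero.
    NonZeroUsize::new(clamped as usize).unwrap_or(min)
}
```

With a 1 ms RTT, 10 Gbit/s (1.25 GB/s) links, and 3 nodes, this yields 7,500,000 bytes, somewhat above the current 5MB constant. As noted in the reply, this budget is separate from the per-session chunk size that must respect the max network request size.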

tillrohrmann · 2025-11-25T22:43:37Z

There seem to be a few test failures on GHA.

tillrohrmann · 2025-11-26T08:35:03Z

crates/worker/src/partition/shuffle.rs

```diff
+                    ingress
+                        .ingest(
+                            msg.partition_key(),
+                            IngestRecord::from_parts(msg.record_keys(), msg),
+                        )
+                        .await?;
```


What about support for rolling upgrades?

`ingest` does not fail unless the ingestion client is closed. In the worst case, it will block until the leaders are responsive.

To avoid this situation, we could release support for the Ingest messages in the PP first and only start using them in the following release.
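A runtime variant of that two-release rollout is a version gate: the sender keeps the old write path until every node runs a version whose PP understands the Ingest messages. This sketch is hypothetical throughout: `Version`, `WritePath`, `choose_write_path`, and the version numbers in the usage are illustrative, and how node versions are discovered is left out.

```rust
/// Hypothetical: how the shuffle delivers outbox messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WritePath {
    /// Pre-upgrade behaviour: append directly to the target partition's log.
    DirectBifrost,
    /// New behaviour: hand records to the ingestion client.
    IngestionClient,
}

/// Hypothetical (major, minor, patch) version; the derived `Ord` compares the
/// fields left to right, which gives the usual version ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u16, pub u16, pub u16);

/// Switch to the ingestion client only once every known node supports the
/// Ingest messages; an empty view of the cluster stays on the old path.
pub fn choose_write_path(node_versions: &[Version], ingest_supported_since: Version) -> WritePath {
    let all_support = !node_versions.is_empty()
        && node_versions.iter().all(|v| *v >= ingest_supported_since);
    if all_support {
        WritePath::IngestionClient
    } else {
        WritePath::DirectBifrost
    }
}
```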

- `ingestion-client` implements the runtime layer that receives WAL envelopes, fans them out to the correct partition, and tracks completion. It exposes:
  - `IngestionClient`, which enforces in-flight budgets and resolves partition IDs before sending work downstream.
  - The session subsystem, which batches `IngestRecords`, retries connections, and reports commit status to callers.
- `ingestion-client` only ingests records and notifies the caller once a record has been "committed" to bifrost by the PP. This makes it useful for implementing the Kafka ingress and other forms of external ingestion.
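Two of the responsibilities listed above, resolving the partition for a record and enforcing an in-flight budget, can be sketched with the standard library alone. `PartitionTable`, `InflightBudget`, and their methods are hypothetical stand-ins rather than the `ingestion-client` API; in particular, the real client presumably waits for budget to free up instead of returning `false`.

```rust
use std::collections::BTreeMap;

/// Hypothetical partition table: maps the first key of each contiguous
/// partition-key range to the id of the partition owning that range.
pub struct PartitionTable {
    ranges: BTreeMap<u64, u16>,
}

impl PartitionTable {
    pub fn new(ranges: &[(u64, u16)]) -> Self {
        Self { ranges: ranges.iter().copied().collect() }
    }

    /// Resolves the owning partition: the range with the largest start <= key.
    pub fn find_partition(&self, key: u64) -> Option<u16> {
        self.ranges.range(..=key).next_back().map(|(_, id)| *id)
    }
}

/// Hypothetical in-flight budget in bytes: reserved before a record is sent,
/// released once the record is reported as committed.
pub struct InflightBudget {
    capacity: usize,
    used: usize,
}

impl InflightBudget {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, used: 0 }
    }

    /// Reserves `bytes` if they fit in the remaining budget.
    /// A real client would wait here rather than fail.
    pub fn try_reserve(&mut self, bytes: usize) -> bool {
        match self.used.checked_add(bytes) {
            Some(total) if total <= self.capacity => {
                self.used = total;
                true
            }
            _ => false,
        }
    }

    /// Returns budget once the commit notification for `bytes` arrives.
    pub fn release(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }

    pub fn available(&self) -> usize {
        self.capacity - self.used
    }
}
```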

Summary: Handle the incoming `IngestRequest` messages sent by the `ingestion-client`

Summary: Refactor ingress-kafka to leverage the `ingestion-client` implementation. This replaces the previous direct write to bifrost, which allows:
- Batching, which increases throughput
- The PP to become the sole writer of its logs (WIP restatedev#3965)

- Use IngestionClient instead of bifrost to write to partition logs
- Remove deprecated `delete_invocation`

Summary: This PR makes sure the cleaner does not perform an external bifrost write, by creating a cleaner effect stream that is handled directly by the PP event loop.
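The cleaner change follows a sole-writer pattern: the cleaner only emits effects, and the PP event loop presumably turns them into commands that the leader appends to its own log. A minimal std-only sketch, using an `mpsc` channel in place of the async effect stream; `CleanerEffect`, `Command`, `cleaner_scan`, and `drain_effects` are hypothetical names.

```rust
use std::sync::mpsc;

/// Hypothetical effect emitted by the cleaner; it carries intent only and
/// holds no handle to Bifrost.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CleanerEffect {
    PurgeInvocation(u64),
}

/// Hypothetical command that the PP event loop would append to its own log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    PurgeInvocation(u64),
}

/// Cleaner side: report expired invocations as effects instead of writing them out.
pub fn cleaner_scan(expired: &[u64], effects: &mpsc::Sender<CleanerEffect>) {
    for &id in expired {
        // A send error only means the event loop has shut down; nothing to clean then.
        let _ = effects.send(CleanerEffect::PurgeInvocation(id));
    }
}

/// Event-loop side: drain pending effects without blocking and map them to
/// commands; the loop remains the only writer to the log.
pub fn drain_effects(effects: &mpsc::Receiver<CleanerEffect>) -> Vec<Command> {
    effects
        .try_iter()
        .map(|effect| match effect {
            CleanerEffect::PurgeInvocation(id) => Command::PurgeInvocation(id),
        })
        .collect()
}
```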


muhamadazmy force-pushed the pr4024 branch from 819226a to d433166 Compare November 18, 2025 14:50

muhamadazmy changed the title ~~[wip] shuffler with ingress client~~ Use ingress-client in the Shuffler Nov 18, 2025

muhamadazmy marked this pull request as ready for review November 18, 2025 14:51

muhamadazmy requested a review from tillrohrmann November 18, 2025 14:51

muhamadazmy force-pushed the pr4024 branch 2 times, most recently from 8008c9c to 91f6046 Compare November 25, 2025 12:54

tillrohrmann approved these changes Nov 25, 2025


tillrohrmann reviewed Nov 26, 2025


muhamadazmy force-pushed the pr4024 branch 15 times, most recently from d6e9955 to 8c0797e Compare December 2, 2025 09:00

muhamadazmy force-pushed the pr4024 branch 4 times, most recently from d14fb10 to e1b4c8e Compare December 2, 2025 14:50

muhamadazmy changed the title ~~Use ingress-client in the Shuffler~~ Use ingestion-client in the Shuffler Dec 2, 2025

muhamadazmy mentioned this pull request Dec 2, 2025

Ensure Partition Processor Leader Is the Sole Log Writer (for vqueues support) #3965 (Open, 5 tasks)

muhamadazmy force-pushed the pr4024 branch from e1b4c8e to f9e84c8 Compare December 3, 2025 10:59

muhamadazmy marked this pull request as draft December 3, 2025 10:59

muhamadazmy force-pushed the pr4024 branch 8 times, most recently from f03e3c2 to 078d17d Compare December 4, 2025 10:43

muhamadazmy marked this pull request as ready for review December 4, 2025 10:44

muhamadazmy requested a review from tillrohrmann December 4, 2025 10:44

muhamadazmy force-pushed the pr4024 branch from 078d17d to f9f6240 Compare December 4, 2025 10:57

muhamadazmy added 7 commits December 4, 2025 11:58

[bifrost] Get a CommitToken back from notify_committed()

eeb457f

[PP] Handle IngestRequest message

ee27f2d


[AdminAPI] Use IngestionClient for invocation and state mgmt

0c3a05d


[Cleaner] remove the cleaner external bifrost writer

8b37c97


Use ingestion-client in the Shuffler

ac8a535


muhamadazmy force-pushed the pr4024 branch from f9f6240 to ac8a535 Compare December 4, 2025 10:58


Review activity: tillrohrmann (Nov 25, 2025), muhamadazmy (Nov 26, 2025), tillrohrmann (Nov 25, 2025), muhamadazmy (Nov 26, 2025), tillrohrmann commented (Nov 25, 2025), tillrohrmann (Nov 26, 2025), muhamadazmy (Nov 26, 2025), muhamadazmy (Dec 1, 2025). 2 participants.
